@francoishernandez commented Jun 4, 2025

⚠️ branch based on #238

This is a follow-up on #240.

This is not ready to merge, but it should be a good starting point for adapting the structure to support image generation.
The Bagel model is not super clean and has quite a few model-specific modules, which makes it difficult to rationalize, but it is IMO a good candidate for exploring new modalities (image generation, thinking, etc.).

This also makes it possible to test simple BNB quantization, which lets the whole model fit on a 24GB GPU (unlike the official code, which offloads parts to CPU). For reference, without any optimization, image generation runs at approx. 3 seconds per timestep on a 3090 (+ 5950X CPU); 30-50 timesteps seems to be the sweet spot.
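To put the timing figures above in perspective, here is a back-of-the-envelope estimate of the wall-clock time per image. It is only a sketch: the helper name `estimate_generation_seconds` is made up for illustration, and it assumes a constant cost per timestep while ignoring model loading, prompt encoding and autoencoder decoding.

```python
def estimate_generation_seconds(
    num_timesteps: int, seconds_per_timestep: float = 3.0
) -> float:
    """Rough per-image estimate, assuming a constant cost per timestep.

    The 3.0 s default is the approximate figure quoted above for a 3090;
    it is not a measured constant and will vary with hardware and settings.
    """
    if num_timesteps < 0:
        raise ValueError("num_timesteps must be non-negative")
    return num_timesteps * seconds_per_timestep


# The 30-50 timestep range mentioned above.
for steps in (30, 50):
    print(f"{steps} timesteps: ~{estimate_generation_seconds(steps):.0f} s")
```

So at the quoted rate, a single image should take roughly 1.5 to 2.5 minutes in the 30-50 timestep range.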

What works

  • simple vision understanding query (e.g. GDP image + prompt) -> test_bagel_understanding.py
  • simple image generation -> test_bagel_generation.py

What needs to be fixed/rationalized

  • position handling currently breaks other vision models; we need to find a proper condition (maybe split out classes as previously discussed)
  • image autoencoder settings are hardcoded/copy-pasted from the official code
  • some settings are not properly grabbed from config yet
  • image transform logic could probably be factored together with the existing logic
  • the image generation codepath triggers an early exit in inference.decode_and_generate; we should probably find a cleaner way to support this (+ support in serving mode)
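One possible direction for the early-exit point above would be to dispatch on the requested output modality instead of returning early from the text path. The sketch below is purely illustrative: `GenerationRequest`, `register_generator` and `dispatch_generation` are hypothetical names that do not exist in the codebase, and the generators return placeholder strings rather than running a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class GenerationRequest:
    """Hypothetical request object; not an existing class in the codebase."""

    prompt: str
    output_modality: str = "text"  # assumed values: "text" or "image"


# Hypothetical registry mapping an output modality to its generation routine.
_GENERATORS: Dict[str, Callable[[GenerationRequest], str]] = {}


def register_generator(modality: str):
    """Decorator that registers a generation routine for one modality."""

    def decorator(fn: Callable[[GenerationRequest], str]):
        _GENERATORS[modality] = fn
        return fn

    return decorator


@register_generator("text")
def generate_text(request: GenerationRequest) -> str:
    # Placeholder for the regular autoregressive decoding path.
    return f"<text for: {request.prompt}>"


@register_generator("image")
def generate_image(request: GenerationRequest) -> str:
    # Placeholder for the timestep loop + autoencoder decoding path.
    return f"<image for: {request.prompt}>"


def dispatch_generation(request: GenerationRequest) -> str:
    """Single entry point: look up the routine instead of exiting early."""
    try:
        generator = _GENERATORS[request.output_modality]
    except KeyError:
        raise ValueError(
            f"unsupported output modality: {request.output_modality!r}"
        ) from None
    return generator(request)
```

With a single entry point like this, a serving layer could call the same function for both paths, which might also help with the serving-mode support mentioned above.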

What needs to be implemented/tested

  • image "editing" use case (+ image cfg and co)
  • "thinking" step support (might be useful for other models as well)
  • multiple image handling (understanding path, probably not supported in generation path yet)
  • batch mode

@francoishernandez added the enhancement and recipes labels Jun 4, 2025
@francoishernandez force-pushed the bagel branch 3 times, most recently from 988602c to 9b87f54 June 4, 2025 16:42